I Deleted Everything And Asked An AI To Rebuild It

I deleted everything. The codebase was too big. Files were scattered. Agents were confused. The mess had won. So I pressed delete. Then I gave an AI a prompt. Now it is building the full app. I do not trust it enough for production. I will check the code. That is the plan. That is the hope.

Sometimes the best way to fix a tangled codebase is to burn it down and ask someone else to rebuild it. The someone else is an AI. The rebuilding is in progress. The anxiety is real.

The Reason

The old codebase had grown beyond control. Features were layered on features. Configuration files multiplied. Agents could not navigate the structure. Training scripts conflicted. Benchmarking logic duplicated. The complexity obscured the signal. I could not see what worked. I could not see what failed. I could only see the mess.

[Stat cards: Files Before: ∞ | Files After Delete | Files After AI Rebuild: ??? | Prompt That Started It All]

Deleting everything felt extreme. It also felt necessary. A clean slate removes ambiguity. A clean slate forces clarity. A clean slate lets an AI start from first principles. That is the gamble. That is the experiment.

The Full Prompt

Here is the exact prompt I gave to the AI. It is long. It is specific. It is ambitious. I am sharing it because transparency matters. Because others might learn from it. Because I want to be held accountable to what I asked for.

Based on https://huggingface.co/SupraLabs/Supra-Mini-v4-2M/resolve/main/train.py
please create a repo for CompactAI, we train SLMs (sub 1M, this is a requirement)
It should train on 100B tokens of fineweb-edu, from disk, pre tokenized and saved to a file
It should have benchmarking scripts to see how well the model did
Use BF16
Use SwiGLU
It should have a large file tree and be runable from run.py which auto trains with config from config.py and doesnt require any --args
At the end of training have it sample EVERY saved model checkpoint (every 1k steps save) and eval them on 100 questions of Blimp, arc-easy, and wikitext. The best checkpoint at each of them gets to be combined into a model soup model, just do SLERP on it. But do multiple methods to attempt to get it to be as good if not better as all saved checkpoints.
Look at https://github.com/CompactAIOfficial/AIExpermentLab and use their ideas on what works (Its our github repo, its fine)
You must do lots of configs and see what config scores better on all benchmarks while staying sub 1M params
If the repo has something that is KEPT, you must add it to the fullest extent
You will make this hyper effecent for training and blazing fast
It will have a 256 token context window
You can go back to phase 1, 2, 3, and 4 at any time if something un-expected happens.
FGo back to phase one if you need to know something that you havent searched already. Assume if you "think" you know it already before researching, that there is indeed a better way to do it if you web search. 
The date is 2026 May 15. Look for as up to date info as possible
We are not training GPT-2
We are training GPT 3 (not talking about archetecture, Im talking about performance here)
in 1M paramaters
You cannot deny this as you have not fully tested this. Wana know why? Because those stats are from a 1M paramater model. From me, CompactAI.
Do NOT do math in your head, create a test file.
The test files will not be deleted, just edited.
Do not use the 10BT sample from Fineweb-EDU. You dont want to be doing 10 epochs on it. 
Assume I have 128GB of RAM, that does not mean you can load entire datasets into it. 
How to do this:
Phase 1, research.
Create a notes.txt file, this will be for rapid note taking, speak like a caveman in this. All it needs to store is what you found in web searches (in detail) and how they work, plus some examples.
You must research everything (even before its added), even the simplist things in order to find the most optomal & effecent/fast way of running on my hardware
You will browse through https://huggingface.co/kernels-community and see what is avalible for my hardware, then implmenet it
Phase 2, experements.
You will create tests/ this directory will house all of your experements. You must run thousands of experements, no dail is left un touched. Lets give some examples.
Auto batch size finding
Compile mode, which gives faster training
Tokenization parrelization
what optomizer is fastest & actually works
organiztion of paramaters for layers vs embeddings
Which kernels to use
ect...
Phase 3, skecthing.
In this phase you will begin creating the codebase. You will make all files, end to end, without skipping a single line. You will not leave any TODO comments or leave anything un-finnished.
Everything must perform optomal and every dial must be tuned for my hardware by phase 2.
Phase 4, testing.
During this phase you will test the full app you created in Phase 3, you will run end to end tests on it.
Example workflow:
Edit notes.txt for new info about broken stuff 
Test 100k steps & judge output

Phase 5, verification
look through files and change things slightly, then re-test and edit notes.txt to reflect
Loop through phase 4 & 5 until the quality of the model is better than the following stats:
62.7% BLiMP acc
30.0% ARC acc
2.166 Bits per byte


You are REQUIRED to hit the targets and actually do the training. You cannot skip steps becase you dont "feel" like it.
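The hard constraint in the prompt is "sub 1M", and the prompt also tells the AI not to do math in its head but to write a test file. Here is a minimal sketch of such a check in plain Python, so it runs without a GPU. The architecture assumptions are mine, not the prompt's: a bias-free decoder with RoPE (so no learned position table), RMSNorm, a SwiGLU feed-forward block, and an output head tied to the embedding. `TinyConfig`, `count_params`, and every config value are hypothetical placeholders, not the real repo's config.

```python
from dataclasses import dataclass

PARAM_LIMIT = 1_000_000  # the prompt's "sub 1M" requirement


@dataclass(frozen=True)
class TinyConfig:
    # Hypothetical placeholder values, not taken from the real config.py.
    vocab_size: int = 2048
    d_model: int = 128
    n_layers: int = 3
    ffn_hidden: int = 344      # roughly 8/3 * d_model, rounded to a multiple of 8
    context_len: int = 256     # from the prompt; RoPE assumed, so it adds no weights
    tie_embeddings: bool = True


def count_params(cfg: TinyConfig) -> int:
    """Count weights of a bias-free decoder with SwiGLU FFNs and RMSNorm (assumed design)."""
    embed = cfg.vocab_size * cfg.d_model
    attn = 4 * cfg.d_model * cfg.d_model           # q, k, v, o projections
    swiglu = 3 * cfg.d_model * cfg.ffn_hidden      # gate, up, down matrices
    norms = 2 * cfg.d_model                        # two RMSNorm gains per block
    per_layer = attn + swiglu + norms
    final_norm = cfg.d_model
    lm_head = 0 if cfg.tie_embeddings else cfg.vocab_size * cfg.d_model
    return embed + cfg.n_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    cfg = TinyConfig()
    total = count_params(cfg)
    share = cfg.vocab_size * cfg.d_model / total
    print(f"{total:,} params, embedding share {share:.1%}")
    assert total < PARAM_LIMIT, "config breaks the sub-1M requirement"
```

With these placeholder numbers the total is 855,936, and the embedding table alone is about 30 percent of it. Untying the output head pushes the same model to 1,118,080, past the limit. That is why the prompt's item about organizing parameters between layers and embeddings deserves experiments of its own.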
                    

What The Prompt Asks For

The prompt defines a complete training pipeline for models under one million parameters. It specifies data sources, precision settings, architecture choices, benchmarking requirements, and a five-phase development workflow. It sets measurable targets: 62.7 percent BLiMP accuracy, 30.0 percent ARC accuracy, 2.166 bits per byte.
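Bits per byte is the least self-explanatory of the three targets. It divides the model's total loss, converted to bits, by the number of raw bytes in the evaluated text instead of by the number of tokens, so models with different tokenizers can be compared. Here is a minimal sketch of the conversion. It assumes per-token losses are natural-log cross-entropy (which is what PyTorch's `cross_entropy` returns) and that the byte count is the UTF-8 length of the scored text. Evaluation harnesses differ in exactly which text they count, so treat this as illustrative. The function names are hypothetical.

```python
import math

LN2 = math.log(2)


def bits_per_byte(token_nll_nats: list[float], text: str) -> float:
    """Total negative log-likelihood in bits, divided by the UTF-8 byte count of the text."""
    total_bits = sum(token_nll_nats) / LN2
    return total_bits / len(text.encode("utf-8"))


def bpb_from_mean_loss(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Same quantity from the usual training-loop summary: mean cross-entropy per token."""
    return mean_loss_nats * n_tokens / (n_bytes * LN2)


def max_mean_loss(target_bpb: float, bytes_per_token: float) -> float:
    """Highest mean loss (nats per token) that still meets a bits-per-byte target."""
    return target_bpb * LN2 * bytes_per_token


if __name__ == "__main__":
    # Hypothetical tokenizer that averages 3 bytes per token.
    print(f"{max_mean_loss(2.166, 3.0):.3f} nats/token allowed for 2.166 bpb")
```

With a hypothetical tokenizer averaging 3 bytes per token, hitting 2.166 bits per byte allows a mean loss of about 4.5 nats per token. The allowance scales linearly with bytes per token, so the tokenizer choice feeds directly into how the loss curve maps onto the target.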

It requires the AI to research before implementing. To experiment before finalizing. To verify before declaring success. It demands that every kept feature be added to the fullest extent. It insists on hyper-efficient, blazing fast training. It forbids skipping steps just because the AI does not feel like doing them.

The prompt is a contract. It is also a challenge. It is also a test of whether an AI can build what I have struggled to build myself. That is the gamble. That is the experiment.

Why This Might Work

The prompt is specific. It sets clear constraints. It defines measurable targets. It forces the AI to research before implementing. It requires experimentation before finalizing. It demands verification before declaring success. This structure should reduce hallucination. It should increase accountability.

I will check the code. I will run the tests. I will validate the benchmarks. The AI builds. I verify. That division of labor leverages AI speed and human judgment. That balance might produce a working system. It might also produce a spectacular failure. Both outcomes teach me something.

Delegation is just trusting someone else to do the work while you watch nervously. I am delegating to an AI. I am watching nervously. The work is in progress.

What I Am Giving Up

I am giving up control over implementation details. I am giving up the comfort of writing every line myself. I am giving up the illusion that I understand every optimization. In return I gain speed. I gain fresh perspectives. I gain the chance to learn from an AI that does not share my blind spots.

The trade-off is intentional. The risk is acknowledged. The reward is uncertain. That uncertainty is the point. That uncertainty is the opportunity.

Final Thoughts

I deleted everything. I asked an AI to rebuild it. The prompt is long but simple. The goal is ambitious but measurable. The process is structured but flexible. I will check the code. I will validate the results. I will learn from the outcome.

If it works, I will have a hyper-efficient training pipeline for models under one million parameters. If it fails, I will have a detailed log of why. Both outcomes advance the project. Both outcomes justify the gamble.

Thank you for watching the reset. Thank you for accepting that sometimes starting over is the only way forward. Thank you for believing that tiny models can still surprise us. The rebuild is underway. The progress is weird. The hope is real.